ggplot2-extension-cookbook

This ggplot2 Extension Cookbook aims to present ggplot2 extension strategies in a consistent and accessible way. The target audience is fluent users of R and ggplot2 who haven’t yet entered the extension space. The intent is to provide many examples of extensions to read, to grow familiarity and confidence, and also to serve as a reference for actually building extensions.

Throughout the material, I’ll try to stick to a formula to orient you within the ggplot2 extension process, so that even if the details seem confusing, you’ll know ‘where’ you are at a higher level:

  • Step 0: Get the job done with ‘base’ ggplot2
  • Step 1: Write a function for the ‘compute’
  • Step 2: Pass the compute to ggproto object
  • Step 3: Pass ggproto to a user-facing function for use in a ggplot() pipeline
  • Step 4: Try out/test/enjoy!
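In skeletal form, the formula looks something like the following sketch (compute_group_example, StatExample, and geom_example are hypothetical placeholder names, and the compute is left as a stub):

```r
library(ggplot2)

# Step 1: a compute function; ggplot2 calls it with the layer's data
compute_group_example <- function(data, scales) {
  # ... transform `data`, returning the columns the geom needs ...
  data
}

# Step 2: pass the compute to a ggproto Stat object
StatExample <- ggproto("StatExample", Stat,
                       compute_group = compute_group_example,
                       required_aes = c("x", "y"))

# Step 3: a user-facing function that wraps ggplot2::layer()
geom_example <- function(mapping = NULL, data = NULL, ...) {
  layer(stat = StatExample, geom = GeomPoint, position = "identity",
        mapping = mapping, data = data, params = list(...))
}

# Step 4: try it out in a ggplot() pipeline
ggplot(mtcars, aes(wt, mpg)) + geom_example()
```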

We group the content by extension type and provide demonstrations of use. Right now, much of the focus is on new geom_* functions. When it comes to excitement about ggplot2 extension packages, new geom_* layer functions really rule the day. See, for example, ‘5 powerful ggplot2 extensions’ (Rapp 2024), in which four of the five focus on new geoms made available by packages, and ‘Favorite ggplot2 extensions’ (slide 38 of C. Scherer’s deck: https://www.cedricscherer.com/slides/RLadiesTunis-2021-favorite-ggplot-extensions.pdf).

Regarding the focus on stat_* vs. geom_* functions, I take a geom_*-first approach, because they are more commonly used. I suspect we find geom_* function names to be more concrete descriptions of what the creator envisions for her plot, whereas stat_* function names may feel a bit more ’adverbial’ and nebulous in their description of rendered output. Consider that ggplot(mtcars, aes(wt, mpg)) + stat_identity() and ggplot(mtcars, aes(wt, mpg)) + geom_point() create identical plots, but the latter feels much more descriptive of the resultant plot. Between these two options, the preference for geom_* is evident in the user data: on GitHub, there are 788 R language files containing ‘stat_identity’, whereas a staggering 261 thousand R language files contain ‘geom_point’. Of course, stat_* constructions are more flexible and important, and the topic is covered later on.
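The two equivalent specifications from the paragraph above, as runnable code:

```r
library(ggplot2)

# Identical plots: stat_identity() defaults to the point geom,
# and geom_point() defaults to the identity stat.
ggplot(mtcars, aes(wt, mpg)) + stat_identity()
ggplot(mtcars, aes(wt, mpg)) + geom_point()
```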

Finally, most of the code is at the ‘R for Data Science’ level, and not the ‘Advanced R’ level, which hopefully will afford greater reach. While object oriented programming (OOP) gets top billing in many extension materials, many folks who are fluent in ggplot2 might not know much about OOP. So I try to see what can be accomplished with little emphasis on OOP and ggproto.

I think it is important for extenders to recognize that a ggplot2 object is not, of course, the rendered plot, but rather a plot specification (of global data, aesthetic mapping, etc.) that results from the declarations the user has made. The ggproto OOP mechanism allows users to enter that conversation, making changes to the ggplot2 specification via their own extensions. The extension style used here will look different from what you will generally see in the wild; we make it as concise and high-level as possible (and close to ignorable for those put off or made nervous by ggproto methods).

For example, defining the object StatCoordinate looks like this:

StatCoordinate <- ggplot2::ggproto(
  `_class` = "StatCoordinate",
  `_inherit` = ggplot2::Stat,
  required_aes = c("x", "y"),
  compute_group = compute_group_coordinates
  )

Currently, I’m experimenting with a ratio typology that you’ll see in the section titles. The idea is to think about how the input data relates to the mark we see on the plot and, in turn, how the mark’s information is stored in the ggplot2 object. This is really new, and I’m unsure of how productive or precise it can be…

Overall, I think the resources in this ggplot2 extension cookbook are aligned with the findings in ‘10 Things Software Developers Should Learn about Learning’.

Preface and acknowledgements

In January 2020, I attended Thomas Lin Pedersen’s talk ‘Extending your ability to extend ggplot2’ seated on the floor of a packed-out ballroom. The talk had an important central message: “you can be a ggplot2 extender”. And since then, I’ve wanted to be in that cool-kid extender club. Four years later, I’m at a point where I can start claiming that identity. I hope that this ggplot2 Extension Cookbook will help you along on your extender journey and, especially if you are fluent in R and ggplot2, that it says to you, “you can be a ggplot2 extender”.

I became a regular ggplot2 user in 2017. I loved how, in general, the syntax was just a simple expression of the core Grammar of Graphics conception of a ‘statistical graphic’ (i.e. data visualization).

A data visualization displays 1) geometries 2) that take on aesthetics (color, size, position, etc) that represent variables 3) from a dataset.

You can learn so much about data via a simple 3-2-1 ggplot2 utterance. And further modifications could be made bit-by-bit, to arrive at the creator’s personal visual preferences.
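A minimal ‘3-2-1 utterance’ of this kind might look like:

```r
library(ggplot2)

ggplot(data = mtcars) +                        # 3: a dataset
  aes(x = wt, y = mpg, color = factor(cyl)) +  # 2: aesthetics representing variables
  geom_point()                                 # 1: a geometry
```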

All of this closely resembles how you might sketch out a plot on a notepad or blackboard, or describe your data representation decisions to yourself or a colleague. As Thomas Lin Pedersen has said, ‘ggplot2 lets you speak your plot into existence’. And, perhaps a little less eloquently, Hadley Wickham, the ggplot2 author: “This is what I’m thinking; you’re the computer, now go and do it!”, a paraphrase of the author talking about how he thought data viz should feel as a graduate student statistical consultant, before ggplot2 existed.

But there were pain points when using ‘base’ ggplot2; for me, this was mostly when a geom didn’t exist for doing some compute in the background, and I needed that compute done over and over. It would be a slog to do the compute for a bunch of subsets of interest upstream of the plotting environment. This pre-computation problem felt manageable in the classroom settings I found myself in through early 2020, but when I moved to a primarily analytic role at West Point — where the volume of analysis was simply higher and turnaround times faster — I felt the problem much more acutely. (Overnight, I went from a weak preference for geom_col to a strong preference for geom_bar!) Extension seemed to offer the solution to the problem, and I was more motivated than ever to break into it in my analyst role.

I experienced about a year of failure and frustration when first entering the extension space. If I weren’t so convinced of the efficiency gains it would eventually yield, and of the elegance of extension, I’d likely have given up. Recognizing the substantial hurdles for even long-time R and ggplot2 users, I think there is space for more ggplot2 extension reference materials, such as the recipes in this ggplot2 Extension Cookbook.

I’m grateful for several experiences and the efforts of others that have refined these materials. First, just after getting my own feet wet in extension, I had the chance to work on extension with students in the context of their independent studies. Our focus was the same type of extension that Pedersen demonstrated – a geom_* function that used a Stat to do some computational work, and then inherited the rest of its behavior from a more primitive geom.

Working with first- and second-year undergrad students meant that I got to think about and formulate workflow: how would we build up skills and ideas in a way that would be accessible to rather new R and ggplot2 users? As veterans of just one or two stats classes that used R and ggplot2, what would they find familiar and accessible? What might we be able to de-emphasize? ggproto and OOP in R hadn’t been touched in coursework. Could we still succeed with extension?

The following steps emerged:

  • Step 0: Get the job done with ‘base’ ggplot2
  • Step 1: Write a function for the ‘compute’
  • Step 2: Pass the compute to ggproto
  • Step 3: Pass ggproto to stat/geom/facet function
  • Step 4: Try out/test/enjoy!

Taking new R users into the extension space was a leap of faith. But I was very impressed with what the students were able to accomplish during a single semester.

I also wondered how the strategy would perform with experienced R and ggplot2 users. Being an academic, I wanted to assess further, so I went down the route of devising a tutorial [with assistance from independent study student Morgan Brown, who continued to work with me for a second semester] and, after refining the tutorial, formally getting feedback on it via focus groups and a survey.

I did some research on ggplot2 extension among ggplot2 and R ‘super users’ and found that this community is perhaps under-served; with the right materials, more folks could get into ggplot2 extension.

I fielded ‘easy geom recipes’ with a group of statistics educators, conducting a survey on the resource and also getting feedback via a focus group.

Among my favorite quotations from the focus groups is something that validated the efforts but also challenged me:

it was … easy. And I felt empowered as a result of that…. But you know, like, my problem isn’t gonna be that easy.

To that participant, I’d say ‘Sometimes it is that easy’. But he is right that oftentimes I come to an extension problem and am surprised that the strategy I think is going to work doesn’t, or at least not without a little fiddling.

The feedback on the easy-geom-recipes was collected in March 2023. I presented on the outcomes at the ASA Chapter meeting of COWY, ‘A New Wave of ggplot2 Extenders’.

To try to make those experiences valuable to others, I follow the ‘recipe’ formula as much as possible so that as strategies morph, one still recognizes ‘where we are’ at a high-level in the process.

After presenting on the success of ‘easy geom recipes’, I felt I was at a crossroads. I could either focus on packaging my material as educational, or I could actually write extensions in R packages. The latter felt a little more true to my interests, but I felt torn. Happily, I ended up landing on a solution where I could have it both ways: writing packages that preserve the story and create recipes along the way. This was enabled by a literate programming mindset generally, and specifically by thinly wrapping knitr::knit_code$get() in my own helper package {readme2pkg}; the functions in {readme2pkg} send code chunks to the appropriate directories for packaging, but live in the README.Rmd as part of the development narrative. (See {litr} as an alternative to {readme2pkg}.) I’m returning to squarely focus on education in creating this ggplot2 extension cookbook. It has been very easy to pull in material from those packages given their adherence to a specific narrative form. In mocking up this book, I’m using code chunk options like child = '../my_gg_experiments/my_extension_ideas.' and code = '../ggwaterfall/R/geom_waterfall'. It is a great help not to have to pull up files and copy and paste. I’m very grateful to Yihui Xie for his insights and efforts at making this possible.

At present, I’ll just show examples of functionality, and then link to the READMEs for further investigation of the specific recipes/strategies used.

I’m personally grateful to other ggplot2 extenders and R enthusiasts that have supported this journey.

I’m also grateful to the ggplot2 development team.

I’m also indebted to my Department of Mathematics and Dean Data Cell colleagues at West Point, for sitting through some talks (some extemporaneous and muddled) where I tried to articulate my ggplot2 extension dreams.

Finally, to Winston Chang, who gets top billing in the ggplot2 extension vignette along with your ggproto: I hope you won’t mind the general approach here, which experiments with making ggproto as ignorable as possible for OOP noobs. I also hope to meet you someday and hear more about the early days of ggproto, maybe at a ggplot2 extenders meetup as a special guest, perhaps January 2025.

And finally, finally, to Hadley Wickham and Leland Wilkinson, for having incredible insights and acting on them.

Getting started

For best results, I’d recommend diving in by actually creating some geoms as prompted in the ‘easy geom recipes’ tutorial, using the rendered tutorial or the plain-text .Rmd file. The ‘easy recipes’ contain three fully worked examples and three exercises that extend the lessons in the examples.

Having completed these exercises, you’ll have lived through geom creation from start to finish, and will be well oriented to the consistent patterns I use, to the extent possible, throughout the cookbook.

easy geom_* functions: writing new definitions for the where and how of marks on ggplots

This section tackles creating new geom_* layers. The strategy is to look at the compute that you’d do without extension capabilities (Step 0), then create a Stat for that compute (Steps 1 & 2), and then compose a user-facing function, which inherits other behavior from a more primitive geom (Step 3), so that ggplot2 can do the compute for you in the background (Step 4).

The section is called ‘easy geoms’ because these geom functions inherit much of their behavior from more primitive geoms like col, text, point, etc.

geom_text_coordinate: 1:1:1

  • for each row in the input dataframe …
  • we’ll perceive a single mark
  • which will be defined by a single row in the internal dataframe

Step 0: use base ggplot2

library(tidyverse)
library(ggxmean)

cars |>
  mutate(coords = 
           paste0("(", speed, ",", dist, ")")) |>
  ggplot() + 
  aes(x = speed, y = dist) + 
  geom_point() + 
  geom_text(aes(label = coords), 
            check_overlap = T,
            hjust = 0,
            vjust = 0)

Step 0.b But we might like the concise syntax…

ggplot(cars) + 
  aes(x = speed, y = dist) + 
  geom_point() + 
  geom_text_coordinate(
            check_overlap = T,
            hjust = 0,
            vjust = 0)

Step 1: compute

compute_group_coordinates <- function(data, scales) {
  
  data |>
    mutate(label = 
             paste0("(", x, ", ", y, ")"))
}

cars %>% 
  rename(x = speed, y = dist) %>% 
  compute_group_coordinates() %>% 
  head()
#>   x  y   label
#> 1 4  2  (4, 2)
#> 2 4 10 (4, 10)
#> 3 7  4  (7, 4)
#> 4 7 22 (7, 22)
#> 5 8 16 (8, 16)
#> 6 9 10 (9, 10)

Step 2: pass to ggproto object

StatCoordinate <- ggplot2::ggproto(
  `_class` = "StatCoordinate",
  `_inherit` = ggplot2::Stat,
  required_aes = c("x", "y"),
  compute_group = compute_group_coordinates
  )

Step 3. Write user facing function.

A user-facing geom_* function will use the ggplot2::layer function under the hood. layer is actually an exported function in ggplot2 and can be used directly in ggplot() pipelines, as shown below.

# part 3.0 use ggplot2::layer which requires specifying Geom and Stat
ggplot(data = cars) + 
  aes(x = speed, y = dist) + 
  geom_point() + 
  ggplot2::layer(
    stat = StatCoordinate,
    geom = ggplot2::GeomText,
    position = "identity"
    )

For convenience, the layer() function is usually wrapped to have a fixed stat or fixed geom. In geom_text_coordinate, because the use-scope is so narrow, both the stat and geom are ‘hard-coded’ in the layer; i.e. stat and geom are not arguments in the geom_* function.

We do anticipate that the user might want to have control over the data and aesthetic mapping specific to the layer (rather than deriving them from global declarations), and therefore make the mapping and data arguments available. Furthermore, the position, show.legend, inherit.aes, and na.rm arguments are made available in the geom, as shown below.

# part 3.b create geom_* user-facing function using ggplot2::layer
geom_text_coordinate <- function(mapping = NULL, 
                                 data = NULL,
                                 position = "identity",
                                 show.legend = NA,
                                 inherit.aes = TRUE, 
                                 na.rm = FALSE,
                                 ...) {
  ggplot2::layer(
    stat = StatCoordinate,
    geom = ggplot2::GeomText, 
    position = position,
    mapping = mapping,
    data = data,
    inherit.aes = inherit.aes,
    show.legend = show.legend,
    params = list(na.rm = na.rm, ...)
  )
}

Step 4: Use/test/enjoy

ggplot(data = cars) + 
  aes(x = speed, y = dist) + 
  geom_point() + 
  geom_text_coordinate() 


last_plot() + 
  aes(color = speed > 15)

geom_post: 1:1:1 🚧 this is a nice example. Add!

geom_xy_means: n:1:1

many rows from the input dataset will be summarized and visualized by a single mark; the mark will be defined by one row of internal data

Step 0. Use base ggplot2

mtcar_xy_means <- mtcars |>
  summarize(wt_mean = mean(wt),
            mpg_mean = mean(mpg))

ggplot(mtcars) + 
  aes(x = wt, y = mpg) + 
  geom_point() + 
  geom_point(data = mtcar_xy_means,
             aes(x = wt_mean, y = mpg_mean),
             size = 8)

Step 1. Write compute function

compute_group_means <- function(data, scales){
  
  data |>
    summarise(x = mean(x),
              y = mean(y))
  
}

Step 2. Define Stat, passing in compute

StatXymean <- ggplot2::ggproto("StatXymean",
                               ggplot2::Stat,
                               compute_group = compute_group_means,
                               required_aes = c("x", "y")
)

Step 3. Write user-facing function

geom_xy_means <- function(mapping = NULL, 
                          data = NULL,
                          position = "identity", 
                          na.rm = FALSE, 
                          show.legend = NA,
                          inherit.aes = TRUE, ...) {

  ggplot2::layer(
    stat = StatXymean, 
    geom = ggplot2::GeomPoint, 
    data = data, 
    mapping = mapping,
    position = position, 
    show.legend = show.legend, 
    inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )

}

Step 4. Use/Test/Enjoy

ggplot(mtcars) + 
  aes(x = wt, y = mpg) + 
  geom_point() + 
  geom_xy_means(size = 8)


last_plot() +
  aes(color = am == 1)

geom_chull: N:1:n

This example uses the chull function in R, which ‘computes the subset of points which lie on the convex hull of the set of points specified.’ In layman’s terms, if you had a bunch of nails hammered into a board and put a rubber band around them, the convex hull would be defined by the subset of nails touching the rubber band.
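As a quick illustration of the compute alone, here is base R’s chull() on a toy set of points (a square plus an interior point):

```r
pts_x <- c(0, 1, 1, 0, 0.5)
pts_y <- c(0, 0, 1, 1, 0.5)

# Returns the indices of the points on the convex hull;
# the interior point (0.5, 0.5), index 5, is not among them.
chull(pts_x, pts_y)
```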

I’m especially excited to include this example, reworked using the Step 0-4 approach, because looking at the ggplot2 extension vignette on stat_chull and geom_chull was ultimately the beginning of layer extension unlocking for me: https://ggplot2.tidyverse.org/articles/extending-ggplot2.html#creating-a-new-stat

Step 0. get it done with ggplot2

library(tidyverse)
chull_row_ids <- chull(mtcars$wt, mtcars$mpg)
chull_row_ids
#>  [1] 17 16 15 24  7 29 21  3 28 20 18
mtcars_chull_subset <- mtcars %>% slice(chull_row_ids)

ggplot(mtcars) + 
  aes(x = wt, y = mpg) + 
  geom_point() + 
  geom_polygon(data = mtcars_chull_subset, 
               alpha = .3, 
               color = "black")

Step 1. Compute

# Step 1
compute_group_c_hull <- function(data, scales){
  
  chull_row_ids <- chull(data$x, data$y)
  
  data %>% slice(chull_row_ids)
  
}

Below, we see that the dataset is reduced to 11 rows which constitute the convex hull perimeter.

mtcars %>% # 32 rows
  rename(x = wt, y = mpg) %>% 
  compute_group_c_hull() # 11 rows
#>                        y cyl  disp  hp drat     x  qsec vs am gear carb
#> Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
#> Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
#> Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
#> Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
#> Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#> Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
#> Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#> Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#> Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
#> Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1

Step 2. Pass to ggproto

# Step 2
StatChull <- ggproto(`_class` = "StatChull",
                     `_inherit` = ggplot2::Stat,
                     compute_group = compute_group_c_hull,
                     required_aes = c("x", "y"))

Step 3. Write user-facing geom_/stat_ Function(s)

geom_chull <- function(mapping = NULL, 
                        data = NULL,
                        position = "identity", 
                        na.rm = FALSE, 
                        show.legend = NA,
                        inherit.aes = TRUE, ...) {

  ggplot2::layer(
    stat = StatChull, 
    geom = ggplot2::GeomPolygon, 
    data = data, mapping = mapping,
    position = position, 
    show.legend = show.legend, 
    inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )

}

Step 4. Try out/test/ enjoy

ggplot(data = mtcars) + 
  aes(x = wt, y = mpg) + 
  geom_point() + 
  geom_chull(alpha = .3)


last_plot() + 
  aes(color = factor(am),
      fill = factor(am))


geom_circlepack: 1:1:n, interdependence; new: defining compute_panel in ggproto

a many-row geom for each row of the input data frame, with interdependence between input observations.

Step 0. 🚧 Add how-to w/ base ggplot2

Step 1. Compute

# Step 1
compute_panel_circlepack <- function(data, scales){

  data %>%
    mutate(id = row_number()) ->
    data1

  if(is.null(data$area)){

    data1 %>%
      mutate(area = 1) ->
      data1

  }

  data1 %>%
    pull(area) %>%
    packcircles::circleProgressiveLayout(
      sizetype = 'area') %>%
    packcircles::circleLayoutVertices(npoints = 50) %>%
    left_join(data1)

}

Step 2. pass to ggproto object

StatCirclepack <- ggplot2::ggproto(`_class` = "StatCirclepack",
                                  `_inherit` = ggplot2::Stat,
                                  required_aes = c("id"),
                                  compute_panel = compute_panel_circlepack,
                                  default_aes = ggplot2::aes(group = after_stat(id))
                                  )

Step 3. pass to user-facing function

geom_circlepack <- function(mapping = NULL, data = NULL,
                           position = "identity", na.rm = FALSE,
                           show.legend = NA,
                           inherit.aes = TRUE, ...) {
  ggplot2::layer(
    stat = StatCirclepack, # proto object from Step 2
    geom = ggplot2::GeomPolygon, # inherit other behavior
    data = data,
    mapping = mapping,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

Step 4. Use/test/enjoy

gapminder::gapminder %>% 
  filter(year == 2002) %>% 
  ggplot() + 
  aes(id = country, fill = pop/1000000) + 
  geom_circlepack()
#> Joining with `by = join_by(id)`


last_plot() + 
  aes(area = pop/1000000) 
#> Joining with `by = join_by(id)`

geom_circle: 1:1:n, 🚧 clarify the reason compute_panel is needed

a single row in the input dataframe will be visualized by a single mark; the mark will be defined by many rows in an internal dataframe

Step 0. Do it with base ggplot2

library(tidyverse)

data.frame(x0 = 0:1, y0 = 0:1, r = 1:2/3) %>% 
  mutate(join_var = 1) %>% 
  mutate(group = row_number()) %>% 
  left_join(tibble(z = 0:15, join_var = 1),
            multiple = "all") %>% 
  mutate(around = 2*pi*z/max(z)) %>% 
  mutate(x = x0 + cos(around)*r,
         y = y0 + sin(around)*r) %>% 
  ggplot() + 
  aes(x, y, label = z) +
  geom_text() +
  geom_path(aes(group = group))
#> Joining with `by = join_by(join_var)`

Step 1. Compute

compute_panel_equilateral <- function(data, scales, n = 15){
  
  data %>% 
    mutate(join_var = 1, 
           group = row_number()) %>% 
    left_join(tibble(z = 0:n, join_var = 1),
              multiple = "all") %>% 
    mutate(around = 2*pi*z/max(z)) %>% 
    mutate(x = x0 + cos(around)*r,
           y = y0 + sin(around)*r) 
  
}

tibble(x0 = 1:2, y0 = 1:2, r = 1 ) %>% 
  compute_panel_equilateral()
#> Joining with `by = join_by(join_var)`
#> # A tibble: 32 × 9
#>       x0    y0     r join_var group     z around      x     y
#>    <int> <int> <dbl>    <dbl> <int> <int>  <dbl>  <dbl> <dbl>
#>  1     1     1     1        1     1     0  0     2      1    
#>  2     1     1     1        1     1     1  0.419 1.91   1.41 
#>  3     1     1     1        1     1     2  0.838 1.67   1.74 
#>  4     1     1     1        1     1     3  1.26  1.31   1.95 
#>  5     1     1     1        1     1     4  1.68  0.895  1.99 
#>  6     1     1     1        1     1     5  2.09  0.5    1.87 
#>  7     1     1     1        1     1     6  2.51  0.191  1.59 
#>  8     1     1     1        1     1     7  2.93  0.0219 1.21 
#>  9     1     1     1        1     1     8  3.35  0.0219 0.792
#> 10     1     1     1        1     1     9  3.77  0.191  0.412
#> # ℹ 22 more rows

Step 2. Pass to ggproto

StatCircle <- ggproto(
  `_class` = "StatCircle", 
  `_inherit` = ggplot2::Stat,
  compute_panel = compute_panel_equilateral,
  required_aes = c("x0", "y0", "r")
  )

Step 3. Write geom_* or stat_*

geom_circle <- function(
  mapping = NULL,
  data = NULL,
  position = "identity",
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE, ...) {
  ggplot2::layer(
    stat = StatCircle,  # proto object from Step 2
    geom = ggplot2::GeomPolygon,  # inherit other behavior
    data = data,
    mapping = mapping,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

Step 4: Enjoy (test)

data.frame(x0 = 0:1, y0 = 0:1, r = 1:2/3) %>% 
  ggplot() + 
  aes(x0 = x0, y0 = y0, r = r) + 
  geom_circle() + 
  aes(fill = r)
#> Joining with `by = join_by(join_var)`


diamonds %>% 
  slice_sample(n = 80) %>% 
  ggplot() + 
  aes(x0 = cut %>% as.numeric, y0 = carat  , r = clarity %>% as.numeric()/20) + 
  geom_circle(alpha = .2) + 
  aes(fill = after_stat(r)) +
  coord_equal()
#> Joining with `by = join_by(join_var)`


cars %>% 
  ggplot() + 
  aes(x0 = speed, y0 = dist, r = speed/dist) + 
  geom_circle()
#> Joining with `by = join_by(join_var)`

  
cars %>% 
  sample_n(12) %>%  
  ggplot() + 
  aes(x0 = speed, y0 = dist, r = 3) + 
  geom_circle(color = "black") +
  coord_equal()
#> Joining with `by = join_by(join_var)`


last_plot() + 
  aes(alpha = speed > 15) +
  aes(linetype = dist > 20) +
  aes(fill = speed > 18) +
  facet_wrap(~ dist > 40)
#> Warning: Using alpha for a discrete variable is not advised.
#> Joining with `by = join_by(join_var)`
#> Joining with `by = join_by(join_var)`

Why not compute_group?

Below, StatCircle2 is identical to StatCircle except that the compute is passed to compute_group instead of compute_panel. Because discrete aesthetics split the data into groups, compute_group runs once per group, and the row_number()-based group ids are then no longer unique across the panel; circles from different groups can end up sharing a group id and being drawn incorrectly.

StatCircle2 <- ggproto(
  `_class` = "StatCircle2",
  `_inherit` = ggplot2::Stat,
  compute_group = compute_panel_equilateral,
  required_aes = c("x0", "y0", "r"))

geom_circle_CG <- function(
  mapping = NULL,
  data = NULL,
  position = "identity",
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE, ...) {
  ggplot2::layer(
    stat = StatCircle2,  # proto object from Step 2
    geom = ggplot2::GeomPolygon,  # inherit other behavior
    data = data,
    mapping = mapping,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

cars %>% 
  sample_n(12) %>%  
  ggplot() + 
  aes(x0 = speed, y0 = dist, r = 3) + 
  geom_circle_CG(color = "black") +
  coord_equal() + 
  aes(alpha = speed > 15) +
  aes(linetype = dist > 20) +
  aes(fill = speed > 18) +
  facet_wrap(~ dist > 40)
#> Warning: Using alpha for a discrete variable is not advised.
#> Joining with `by = join_by(join_var)`
#> Joining with `by = join_by(join_var)`
#> Joining with `by = join_by(join_var)`

geom_state: 1:1:n

Step 0: use base ggplot2

tibble(state.name) %>% 
  mutate(ind_vowel_states = 
           str_detect(state.name, "A|E|I|O|U")) ->
states_characteristics

states_characteristics %>% 
  rename(state_name = state.name) %>% 
  left_join(ggstates::state_reference_full) %>% 
  sf::st_as_sf() %>% 
  ggplot() + 
  geom_sf() +
  aes(fill = ind_vowel_states)
#> Joining with `by = join_by(state_name)`

Step 1: Write compute function 🚧

code = readlines_wo_roxygen(x = "../ggstates/R/geom_state.R")

Step 3. Pass to user-facing function.. 🚧

Step 4. Use/Test/Enjoy 🚧

ggplot(data = states_characteristics) + 
  aes(state_name = state.name) +
  ggstates::geom_state()

geom_ols: n:k:w; interdependence

between-group computation

geom_county: 1:1:1 via geometry sf

a geom defined by an sf geometry column

Step 0 🚧 add example

library(ggnorthcarolina)

Step 1, 2, 3. compute 🚧 want to see if xmin, xmax columns can be added within compute using ggplot2 function; more to figure out with CRSs

################# Step 1. Compute panel function ###########

compute_county_northcarolina <- function(data, scales, keep_county = NULL){

  reference_filtered <- northcarolina_county_reference
  #
  if(!is.null(keep_county)){

    keep_county %>% tolower() -> keep_county

    reference_filtered %>%
      dplyr::filter(.data$county_name %>%
                      tolower() %in%
                      keep_county) ->
      reference_filtered

  }
#
#   # to prevent overjoining
#   reference_filtered %>%
#     dplyr::select("fips",  # id columns
#                   "geometry",
#                   "xmin","xmax",
#                   "ymin", "ymax") ->
#     reference_filtered


  data %>%
    dplyr::inner_join(reference_filtered) #%>% # , by = join_by(fips)
    # dplyr::mutate(group = -1) %>%
    # dplyr::select(-fips) #%>%
    # sf::st_as_sf() %>%
    # sf::st_transform(crs = 5070)

}


###### Step 2. Specify ggproto ###############

StatCountynorthcarolina <- ggplot2::ggproto(
  `_class` = "StatCountynorthcarolina",
  `_inherit` = ggplot2::Stat,
  compute_panel = compute_county_northcarolina,
  default_aes = ggplot2::aes(geometry = ggplot2::after_stat(geometry)))


########### Step 3. geom function, inherits from sf ##################

geom_county <- function(
      mapping = NULL,
      data = NULL,
      position = "identity",
      na.rm = FALSE,
      show.legend = NA,
      inherit.aes = TRUE,
      crs = "NAD27", # "NAD27", 5070, "WGS84", "NAD83", 4326 , 3857
      ...) {
            c(ggplot2::layer_sf(
              stat = StatCountynorthcarolina,  # proto object from step 2
              geom = ggplot2::GeomSf,  # inherit other behavior
              data = data,
              mapping = mapping,
              position = position,
              show.legend = show.legend,
              inherit.aes = inherit.aes,
              params = rlang::list2(na.rm = na.rm, ...)),
              coord_sf(crs = crs,
                       default_crs = sf::st_crs(crs),
                       datum = crs,
                       default = TRUE)
            )
  }

Step 4. Use/test/enjoy

ggnorthcarolina::northcarolina_county_flat %>% 
  ggplot() + 
  aes(fips = fips) + 
  geom_county() 
#> Joining with `by = join_by(fips)`


last_plot() + 
  aes(fill = SID74/BIR74)
#> Joining with `by = join_by(fips)`


geom_candlestick: summarize first, then interdependence …

geom_pie: n -> 1:1:1

code = readlines_wo_roxygen("../ggwedge/R/compute_panel_pie.R")

geom_wedge: n -> 1:1:n

stat_* layers: keeping flexible via stat_* functions

stat_chull

Rather than defining geom functions, you might instead write stat_* functions which can be used with a variety of geoms. Let’s contrast geom_chull and stat_chull below.

geom_chull <- function(mapping = NULL, 
                        data = NULL,
                        position = "identity", 
                        na.rm = FALSE, 
                        show.legend = NA,
                        inherit.aes = TRUE, ...) {

  ggplot2::layer(
    stat = StatChull, 
    geom = ggplot2::GeomPolygon, 
    data = data, mapping = mapping,
    position = position, 
    show.legend = show.legend, 
    inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )

}


stat_chull <- function(mapping = NULL, 
                       geom = ggplot2::GeomPolygon, 
                       data = NULL,
                       position = "identity", 
                       na.rm = FALSE, 
                       show.legend = NA,
                       inherit.aes = TRUE, ...) {

  ggplot2::layer(
    stat = StatChull, 
    geom = geom, 
    data = data, 
    mapping = mapping,
    position = position, 
    show.legend = show.legend, 
    inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )

}

The construction is almost identical. However, in the stat version the geom is flexible: it can be user-defined rather than hard-coded into the function. This lets you take the layer in different visual directions, though it may carry a higher cognitive load.

p <- ggplot(data = mtcars) + 
  aes(x = wt, y = mpg) + 
  geom_point() 

p +
  stat_chull(alpha = .3)


p +
  stat_chull(geom = "point",
             color = "red",
             size = 4)


p + 
  stat_chull(geom = "text",
             label = "c-hull point",
             hjust = 0)


# shows the stat does not serve every geom equally well;
# "path" ignores the text parameters passed here
p + 
  stat_chull(geom = "path",
             label = "c-hull point",
             hjust = 0)
#> Warning in stat_chull(geom = "path", label = "c-hull point", hjust = 0):
#> Ignoring unknown parameters: `label` and `hjust`

stat_waterfall: 1:1:1; interdependence

Because stat_* functions may ask more of the user cognitively, aliasing can be a good idea: create one or more geom_* functions from the stat layer.
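As a minimal sketch of the aliasing idea, using the stat_chull() defined earlier (geom_chull_point() is a hypothetical name, not from the source):

```r
# hypothetical alias: fix the geom so the user doesn't have to choose one
geom_chull_point <- function(...) {
  stat_chull(geom = "point", ...)
}
```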

The function geom_waterfall() is exported from the ggwaterfall package, which is being developed at github.com/EvaMaeRey/ggwaterfall

One-row geom for each row in input dataset; geom interdependence

A waterfall plot displays inflows and outflows that occur as a result of events as well as the balance across multiple events. It is typically displayed as a series of rectangles. Because the net change is displayed (cumulative change), there is interdependence between the geometries on our plot – where one rectangle ends, the next in the series begins.
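The interdependence boils down to a cumulative sum. A minimal base R sketch, using the change values from the example below:

```r
# the running balance behind a waterfall plot, in base R
change <- c(6400, -1100, -100, -4200, 3800, 1400, -2800)

ymin <- cumsum(c(0, change[-length(change)]))  # where each rectangle starts
ymax <- ymin + change                          # where each rectangle ends

ymin  # 0 6400 5300 5200 1000 4800 6200
ymax  # 6400 5300 5200 1000 4800 6200 3400
# note: each bar starts where the previous one ended (ymax[i] == ymin[i + 1])
```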

In this example we’ll see how to alias the stat to a geom user-facing function (stat_waterfall -> geom_waterfall), and also how to change the geom to allow for an additional convenient user-facing function (stat_waterfall -> geom_waterfall_label). We prepare for geom_waterfall_label() by using the default_aes slot in the ggproto step.

library(ggwaterfall)
library(tidyverse)
flow_df <- data.frame(
  event = c("Sales", "Refunds", "Payouts", "Court Losses", 
            "Court Wins", "Contracts", "Fees"),
           change = c(6400, -1100, 
                      -100, -4200, 3800, 
                      1400, -2800)) |> 
  mutate(event = factor(event))

flow_df
#>          event change
#> 1        Sales   6400
#> 2      Refunds  -1100
#> 3      Payouts   -100
#> 4 Court Losses  -4200
#> 5   Court Wins   3800
#> 6    Contracts   1400
#> 7         Fees  -2800

flow_df |> 
  ggplot() +
  geom_hline(yintercept = 0) +
  aes(change = change, 
      x = event) + # event in order
  geom_waterfall() + 
  geom_waterfall_label() + 
  scale_y_continuous(expand = expansion(.1)) + 
  scale_fill_manual(values = c("springgreen4", "darkred"))

The strategy to create geom_waterfall() follows the standard four steps.

Step 0

For ‘step 0’, we use base ggplot2 to accomplish the task, following fairly closely Hadley Wickham’s short paper that tackles a waterfall plot with ggplot2: https://vita.had.co.nz/papers/ggplot2-wires.pdf

Steps 1 and 2

Then we bundle up this computation into a function (step 1), called compute_panel_waterfall. This function then defines the compute_panel element of the ggproto object (step 2). We want the computation done panel-wise because of the interdependence between the events, which run along the x axis. Group-wise computation (defining the compute_group element instead) would fail us, as the cross-event interdependence would not be preserved.

compute_panel_waterfall <- function(data, scales, width = .90){
  
  data %>% 
  mutate(x_scale = x) %>%                 # preserve the original (categorical) x
  mutate(x_pos = x %>% as.numeric()) %>%  # numeric positions for rect placement
  arrange(x_pos) %>% 
  mutate(balance = cumsum(c(0,            # running balance: each bar starts
                            change[-nrow(.)]))) %>%  # where the previous ended
  mutate(direction = factor(sign(change))) %>% 
  mutate(xmin = x_pos - width/2,
         xmax = x_pos + width/2,
         ymin = balance,
         ymax = balance + change) %>% 
  mutate(x = x_pos) %>% 
  mutate(y = ymax) %>%                    # y is used for label placement
  mutate(gain_loss = ifelse(direction == 1, "gain", "loss"))
  
}


### Step 1.1 Test compute 

# flow_df %>% 
#   rename(x = event) %>% 
#   compute_panel_waterfall() 



## Step 2. Pass compute to ggproto 

StatWaterfall <- ggplot2::ggproto(`_class` = "StatWaterfall", 
                         `_inherit` = ggplot2::Stat,
                         required_aes = c("change", "x"),
                         compute_panel = compute_panel_waterfall,
                         default_aes = ggplot2::aes(label = ggplot2::after_stat(change),
                                           fill = ggplot2::after_stat(gain_loss),
                                           vjust = ggplot2::after_stat((direction == -1) %>%
                                                                as.numeric)))

Step 3

In step 3, we define stat_waterfall, passing along StatWaterfall to create a ggplot2 layer function. We include a standard set of arguments, and we set the geom to ggplot2::GeomRect.

stat_waterfall <- function(geom = ggplot2::GeomRect, 
  mapping = NULL,
  data = NULL,
  position = "identity",
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE, ...) {
  ggplot2::layer(
    stat = StatWaterfall,  # proto object from step 2
    geom = geom,  # inherit other behavior
    data = data,
    mapping = mapping,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

geom_waterfall <- stat_waterfall  # alias; GeomRect is already the default geom


geom_waterfall_label <- function(..., lineheight = .8){
  stat_waterfall(geom = "text", 
                 lineheight = lineheight, ...)}

Step 4

In Step 4, we get to try out the functionality.

flow_df |> 
  ggplot() +
  geom_hline(yintercept = 0) +
  aes(change = change, 
      x = event) + # event in order
  geom_waterfall() + 
  geom_waterfall_label() + 
  scale_y_continuous(expand = expansion(.1)) + 
  scale_fill_manual(values = c("springgreen4", "darkred"))

last_plot() + 
  aes(x = fct_reorder(event, change))

last_plot() + 
  aes(x = fct_reorder(event, abs(change)))

The final plot shows that while there are some convenience defaults for label and fill, these can be overridden.

last_plot() + 
  aes(label = ifelse(change > 0, "gain", "loss")) + 
  aes(fill = NULL)

borrowing compute

geom_smoothfit: 1:1:1 ggproto piggybacking on compute…

n:1:80 is the geom_smooth() default (n input rows, one fitted curve, evaluated at 80 points).

ggplot(data = mtcars) + 
  aes(x = wt, y = mpg) + 
  geom_point() + 
  geom_smooth() +
  stat_smooth(xseq = mtcars$wt, 
              geom = "point",
              color = "blue")
#> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
#> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

### Step 1. compute

compute_group_smooth_fit <- function(data, scales, method = NULL, formula = NULL,
                           se = TRUE, n = 80, span = 0.75, fullrange = FALSE,
                           level = 0.95, method.args = list(),
                           na.rm = FALSE, flipped_aes = NA){
  
  # piggyback on StatSmooth's compute, evaluating at the observed x values
  out <- ggplot2::StatSmooth$compute_group(data = data, scales = scales, 
                       method = method, formula = formula, 
                       se = FALSE,  # only fitted values are needed here
                       n = n, span = span, fullrange = fullrange,
                       xseq = data$x, 
                       level = level, method.args = method.args, 
                       na.rm = na.rm, flipped_aes = flipped_aes) 
  
  # carry the observed points along, so segments can connect them to the fit
  out$x_obs <- data$x
  out$y_obs <- data$y
  
  out$xend <- out$x_obs
  out$yend <- out$y_obs
  
  out
  
}

Step 2
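A sketch of what this step might look like, passing compute_group_smooth_fit to a ggproto object (the name StatSmoothFit is an assumption, not from the source):

```r
# a sketch, assuming compute_group_smooth_fit from step 1 is in scope
StatSmoothFit <- ggplot2::ggproto(
  `_class` = "StatSmoothFit",
  `_inherit` = ggplot2::Stat,
  required_aes = c("x", "y"),
  compute_group = compute_group_smooth_fit)
```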

Step 3

geom_smooth_predict <- function(xseq, mapping = NULL, data = NULL, ...,
                                method = NULL, formula = NULL, se = TRUE,
                                method.args = list(), na.rm = FALSE,
                                orientation = NA, show.legend = NA,
                                inherit.aes = TRUE, color = "blue"){
  
  stat_smooth(mapping = mapping, data = data, geom = "point",
              position = "identity", xseq = xseq, ...,
              method = method, formula = formula, se = se,
              method.args = method.args,  # pass through (was hard-coded list())
              na.rm = na.rm, orientation = orientation,
              show.legend = show.legend, inherit.aes = inherit.aes,
              color = color)
  
}
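A quick try-out of geom_smooth_predict(), in the spirit of step 4 (assuming the definition above is in scope and ggplot2 is attached):

```r
library(ggplot2)

ggplot(mtcars) +
  aes(x = wt, y = mpg) +
  geom_point() +
  geom_smooth() +
  geom_smooth_predict(xseq = mtcars$wt)  # blue predictions at observed wt
```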

add default aesthetics

geom_barlab: Adding defaults to existing stats via ggproto editing

modified start points: ggverbatim()

ggverbatim()

ggverbatim <- function(data, cat_cols = 1, row_var_name = NULL, cols_var_name = "x", value_var_name = NULL){

  row_var_name <- names(data)[1]   # first column identifies the rows
  names(data)[1] <- "row_var"

  message("Visually represented variables (aesthetic mappings) are 'x' and " |> paste(row_var_name))

  col_headers <- names(data)
  col_headers <- col_headers[2:length(col_headers)]

  data %>%
    mutate(row_var = fct_inorder(row_var)) %>%
    pivot_longer(cols = -cat_cols) %>%
    mutate(name = factor(name, levels = col_headers)) %>%
    rename(x = name) ->
  pivoted

  pivoted %>%
    ggplot() +
    aes(x = x) +
    labs(x = cols_var_name) +
    aes(y = row_var) +
    labs(y = row_var_name) +
    aes(label = value) +
    aes(fill = value) +
    scale_x_discrete(position = "top") +
    scale_y_discrete(limits=rev)

}
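A hedged usage sketch (the small data frame is invented for illustration; tidyverse attached):

```r
library(tidyverse)

scores <- data.frame(
  student = c("Ana", "Ben"),
  quiz1 = c(8, 7),
  quiz2 = c(9, 6))

# ggverbatim() maps label and fill to the cell values,
# so tile and text layers come ready to use
scores |>
  ggverbatim(cols_var_name = "quiz") +
  geom_tile(alpha = .5) +
  geom_text()
```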

ggedgelist()

# get into ggplot2 plot space from edge list data frame 
ggedgelist <- function(edgelist, nodelist = NULL, ...){
  
  # message("'name' a variable created in the 'nodes' dataframe")
  
  if(is.null(nodelist)){
    
    edgelist %>% 
      tidygraph::as_tbl_graph() %>% 
      ggraph::ggraph(...) 
    
  }else{ # join on node attributes if they are available
    
    names(nodelist)[1] <- "name"
    
    edgelist %>% 
      tidygraph::as_tbl_graph() %>%
      dplyr::full_join(nodelist) %>% 
      ggraph::ggraph(...) 
    
  }
  
}

# get a full viz with an edgelist dataframe only
ggedgelist_quick <- function(edgelist, nodelist = NULL, include_names = FALSE,  ...){
  

  p <- ggedgelist(edgelist = edgelist,
                  nodelist = nodelist, ...) +
  ggraph::geom_edge_link(color = "orange") +
  ggraph::geom_node_point(size = 9,
                  color = "steelblue",
                  alpha = .8) 
  
  if(include_names){p + ggraph::geom_node_label(aes(label = name))}else{p}
  
}

geom_node_label_auto <- function(...){ 
  
  ggraph::geom_node_label(aes(label = name), ...)
  
}

geom_node_text_auto <- function(...){ 
  
  ggraph::geom_node_text(aes(label = name), ...)
  
}
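A hedged usage sketch (assuming ggraph and tidygraph are installed; the edge list is invented for illustration):

```r
edges <- data.frame(
  from = c("a", "a", "b"),
  to   = c("b", "c", "d"))

edges |>
  ggedgelist_quick(include_names = TRUE)
```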

theme_chalkboard()


geoms_chalk <- function(color = "lightyellow", fill = color){

  # https://stackoverflow.com/questions/21174625/ggplot-how-to-set-default-color-for-all-geoms

  ggplot2::update_geom_defaults("point",   list(colour = color, size = 2.5, alpha = .75))
  ggplot2::update_geom_defaults("segment",   list(colour = color, size = 1.25, alpha = .75))
  ggplot2::update_geom_defaults("rug",   list(colour = color, size = 1, alpha = .75))
  ggplot2::update_geom_defaults("rect",   list(colour = color, size = 1, alpha = .75))
  ggplot2::update_geom_defaults("label",   list(fill = fill, color = "grey35", size = 5))

  # params <- ls(pattern = '^geom_', env = as.environment('package:ggxmean'))
  # geoms <- gsub("geom_", "", params)
  #
  # lapply(geoms, update_geom_defaults, list(colour = "oldlace"))

}

theme_chalkboard <- function(board_color = "darkseagreen4", chalk_color = "lightyellow"){

  list(
    ggplot2::theme(rect = ggplot2::element_rect(fill =
                                                  board_color)),
    ggplot2::theme(text = ggplot2::element_text(color = chalk_color,
                                                face = "italic",
                                                size = 15)),
    ggplot2::theme(panel.background =
                     ggplot2::element_rect(fill = board_color)),
    ggplot2::theme(legend.key = ggplot2::element_blank()),
    ggplot2::theme(legend.title = ggplot2::element_blank()),
    ggplot2::theme(axis.text =
                     ggplot2::element_text(color = chalk_color)),
    ggplot2::theme(axis.ticks =
                     ggplot2::element_line(color = chalk_color)),
    ggplot2::theme(panel.grid = ggplot2::element_blank())
  )

}

theme_chalkboard_slate <- function(){

  theme_chalkboard("lightskyblue4", "honeydew")

}
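A try-out sketch (assuming the definitions above are in scope and ggplot2 is attached):

```r
library(ggplot2)

geoms_chalk()  # chalk-colored geom defaults

ggplot(mtcars) +
  aes(x = wt, y = mpg) +
  geom_point() +
  theme_chalkboard()
```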

geom-led extension

ggscatterplot

make it a package: ggtedious formal testing

This is a placeholder for the ggtedious workshop, yet to be completed.

https://github.com/EvaMaeRey/ggtedious

# library(ggtedious)